11 research outputs found

    Optimistic distributionally robust optimization for nonparametric likelihood approximation

    The likelihood function is a fundamental component in Bayesian statistics. However, evaluating the likelihood of an observation is computationally intractable in many applications. In this paper, we propose a non-parametric approximation of the likelihood that identifies a probability measure lying in a neighborhood of the nominal measure and maximizing the probability of observing the given sample point. We show that when the neighborhood is constructed by the Kullback-Leibler divergence, by moment conditions or by the Wasserstein distance, our optimistic likelihood can be determined through the solution of a convex optimization problem, and it admits an analytical expression in particular cases. We also show that the posterior inference problem with our optimistic likelihood approximation enjoys strong theoretical performance guarantees, and it performs competitively in a probabilistic classification task.
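
    One compact way to read this construction (the notation here is ours, not the paper's): given a nominal measure nu_hat, an observation x, and a neighborhood B(nu_hat) of admissible measures, the optimistic likelihood is the largest probability mass any admissible measure can assign to x,

        \[
            L(x) \;=\; \sup_{\mu \in \mathcal{B}(\hat{\nu})} \mu(\{x\}),
        \]

    where B(nu_hat) is built from the Kullback-Leibler divergence, from moment conditions, or from the Wasserstein distance; the abstract's claim is that this supremum reduces to a convex optimization problem, with closed-form solutions in special cases.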

    Semi-supervised Learning based on Distributionally Robust Optimization

    We propose a novel method for semi-supervised learning (SSL) based on data-driven distributionally robust optimization (DRO) using optimal transport metrics. Our proposed method improves generalization by using the unlabeled data to restrict the support of the worst-case distribution in our DRO formulation. We make the DRO formulation practical by proposing a stochastic gradient descent algorithm that keeps the training procedure simple to implement. We demonstrate that our semi-supervised DRO method improves on natural supervised procedures and state-of-the-art SSL estimators in terms of generalization error. Finally, we include a discussion of the large-sample behavior of the optimal uncertainty region in the DRO formulation. Our discussion exposes important aspects such as the role of dimension reduction in SSL.
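
    As a rough illustration of the min-max structure such DRO training loops share (a generic sketch under our own assumptions, not the paper's optimal-transport, semi-supervised formulation; all variable names and constants below are invented), one can alternate a gradient step on the model with an adversarial reweighting of the sample:

        # Illustrative only: generic distributionally robust training by alternating
        # gradient steps; NOT the paper's optimal-transport SSL method.
        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 5))                 # toy features
        y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)

        w = np.zeros(5)                               # logistic-regression weights
        n = len(y)
        q = np.full(n, 1.0 / n)                       # adversarial weights on the sample
        eta_w, eta_q, budget = 0.1, 0.5, 0.05         # step sizes and divergence budget

        def per_sample_loss(w):
            p = 1.0 / (1.0 + np.exp(-(X @ w)))
            return -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

        for _ in range(500):
            # inner maximization: mirror-ascent step on q over the simplex ...
            q = q * np.exp(eta_q * per_sample_loss(w))
            q /= q.sum()
            # ... kept close to uniform by a crude KL check (stand-in for a projection)
            if np.sum(q * np.log(q * n)) > budget:
                q = 0.9 * q + 0.1 / n
            # outer minimization: gradient step on w under the reweighted loss
            p = 1.0 / (1.0 + np.exp(-(X @ w)))
            w -= eta_w * (X.T @ (q * (p - y)))

        print("robust logistic weights:", np.round(w, 3))

    In the paper's setting the adversary is additionally restricted to distributions supported on the unlabeled data and measured in an optimal transport metric, which is what ties the unlabeled sample to the generalization behavior discussed above.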

    Calculating optimistic likelihoods using (geodesically) convex optimization

    A fundamental problem arising in many areas of machine learning is the evaluation of the likelihood of a given observation under different nominal distributions. Frequently, these nominal distributions are themselves estimated from data, which makes them susceptible to estimation errors. We thus propose to replace each nominal distribution with an ambiguity set containing all distributions in its vicinity and to evaluate an optimistic likelihood, that is, the maximum of the likelihood over all distributions in the ambiguity set. When the proximity of distributions is quantified by the Fisher-Rao distance or the Kullback-Leibler divergence, the emerging optimistic likelihoods can be computed efficiently using either geodesic or standard convex optimization techniques. We showcase the advantages of working with optimistic likelihoods on a classification problem using synthetic as well as empirical data.
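
    Concretely, for the classification use case the abstract mentions, one can read the decision rule as follows (notation ours): fit a nominal distribution P_hat_i per class, surround it with an ambiguity set A_i, and assign a test point x to the class whose ambiguity set can best explain it,

        \[
            \hat{c}(x) \;=\; \arg\max_{i} \; \sup_{\mathbb{Q} \in \mathcal{A}_i} q(x),
            \qquad
            \mathcal{A}_i \;=\; \bigl\{ \mathbb{Q} \,:\, d(\mathbb{Q}, \hat{\mathbb{P}}_i) \le \rho_i \bigr\},
        \]

    where q denotes the density of Q and d is the Fisher-Rao distance or the Kullback-Leibler divergence; the abstract's point is that the inner supremum can then be evaluated with geodesic or standard convex optimization.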

    Calculating optimistic likelihoods using (geodesically) convex optimization

    33rd Conference on Neural Information Processing Systems (NeurIPS 2019), 8-14 Dec 2019, Vancouver, Canada. Version of Record. Self-funded. Published.

    Optimistic distributionally robust optimization for nonparametric likelihood approximation

    33rd Conference on Neural Information Processing Systems (NeurIPS 2019), 8-14 Dec 2019, Vancouver, Canada. Version of Record. EPSRC and others. Published.

    Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning

    Many decision problems in science, engineering and economics are affected by uncertain parameters whose distribution is only indirectly observable through samples. The goal of data-driven decision-making is to learn a decision from finitely many training samples that will perform well on unseen test samples. This learning task is difficult even if all training and test samples are drawn from the same distribution, especially if the dimension of the uncertainty is large relative to the training sample size. Wasserstein distributionally robust optimization seeks data-driven decisions that perform well under the most adverse distribution within a certain Wasserstein distance from a nominal distribution constructed from the training samples. In this tutorial we will argue that this approach has many conceptual and computational benefits. Most prominently, the optimal decisions can often be computed by solving tractable convex optimization problems, and they enjoy rigorous out-of-sample and asymptotic consistency guarantees. We will also show that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning and motivates new approaches for fundamental learning tasks such as classification, regression, maximum likelihood estimation or minimum mean square error estimation, among others.
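
    Written out (notation ours), the problem described above is the data-driven min-max program

        \[
            \min_{\theta \in \Theta} \;
            \sup_{\mathbb{Q} \,:\, W(\mathbb{Q}, \widehat{\mathbb{P}}_N) \le \varepsilon}
            \; \mathbb{E}_{\xi \sim \mathbb{Q}} \bigl[ \ell(\theta, \xi) \bigr],
        \]

    where P_hat_N is the nominal (empirical) distribution built from the N training samples, W is the Wasserstein distance, epsilon is the radius of the ambiguity ball, and l(theta, xi) is the loss of decision theta under realization xi; the tutorial's claims are that this problem is often a tractable convex program and that its solutions come with out-of-sample and asymptotic consistency guarantees.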